Photophysical image analysis: Unsupervised probabilistic thresholding for images from electron-multiplying charge-coupled devices

We introduce the concept photophysical image analysis (PIA) and an associated pipeline for unsupervised probabilistic image thresholding for images recorded by electron-multiplying charge-coupled device (EMCCD) cameras. We base our approach on a closed-form analytic expression for the characteristic function (Fourier-transform of the probability mass function) for the image counts recorded in an EMCCD camera, which takes into account both stochasticity in the arrival of photons at the imaging camera and subsequent noise induced by the detection system of the camera. The only assumption in our method is that the background photon arrival to the imaging system is described by a stationary Poisson process (we make no assumption about the photon statistics for the signal). We estimate the background photon statistics parameter, λbg, from an image which contains both background and signal pixels by use of a novel truncated fit procedure with an automatically determined image count threshold. Prior to this, the camera noise model parameters are estimated using a calibration step. Utilizing the estimates for the camera parameters and λbg, we then introduce a probabilistic thresholding method, where, for the first time, the fraction of misclassified pixels can be determined a priori for a general image in an unsupervised way. We use synthetic images to validate our a priori estimates and to benchmark against the Otsu method, which is a popular unsupervised non-probabilistic image thresholding method (no a priori estimates for the error rates are provided). For completeness, we lastly present a simple heuristic general-purpose segmentation method based on the thresholding results, which we apply to segmentation of synthetic images and experimental images of fluorescent beads and lung cell nuclei. Our publicly available software opens up for fully automated, unsupervised, probabilistic photophysical image analysis.

where we switched the order of the integral and the sum, used the fact that characteristic function of the Dirac delta function is equal to 1 and the definition of the Gamma function.
Similarly, since the number of output electrons n oe is a hidden parameter, we again marginalize to obtain e ip(nic−noe/f −∆) dn ic dn oe (S.7) = e ip∆ e i(p/f ) noe exp − which is just the product of the characteristic function for n oe and characteristic function for Gaussian distribution of mean ∆ and standard deviation r.On the second line above, we changed the lower integration limit over the integral over n ic to minus infinity, which is justified if ∆ is large.Finally, to statistically include the effect of rounding, we approximate the rounded image counts, n icr , by adding a uniformly distributed random number between −1/2 and 1/2 to n ic .The characteristic function for n icr is then obtained by multiplying e ipnic by the characteristic function of the uniform distribution , i.e.
Numerical Fourier-inversion of this CF give the image count PMF and CDF (see Sec. S2).
If instead the electron multiplication is turned off, an identical calculation, but now using the associated expression for p(n oe |n ie , g) for the no gain case (see Eq. ( 1) in the main text), yields: As a consistency check of our image segmentation method, we calculate p-values for the summed image counts in each detected region.This calculation requires the PMF and CDF for summed image counts.Therefore, let us consider the summed image counts from m independent pixels, N = m j=1 n (j) icr .Since the characteristic function for the sum of independent random numbers is the product of the individual characteristic functions, we have:

S2: Inversion of Characteristic function
The Fourier-inverse of the characteristic function is based on the approach in [3] (see also [4]) .We use the Gil-Pelaez inversion formula [5,6]  where (f ) denotes the real part of f .Here the integral is taken up to π since the underlying random variable Y is discrete.We approximate Eq. (S.13) by using a trapezoidal quadrature to get w j e −ipj y e ipj nicr (S.14) and p j = jδ p with j = 0, . . ., N .Here, the interval (L, U ) specifies the range for the values of n icr , here estimated to be Fluorescent beads (250 nm diameter green polystyrene, CV of 5%, Thermo Fisher Scientific, MA, USA) were diluted 100 times in purified water (Milli-Q) and pipetted on-to a standard glass slide (76x26 mm, soda-lime, Menzel Gläser, Thermo Fisher Scientific).The dispersion was sealed with a coverslip (No. 1.5H coverslip, Marienfeld, Lauda-Königshofen, Germany) and nail polish to prevent evaporation.
Human lung adenocarcinoma A549 cells (ECCC, distributor Sigma Aldrich, MO, USA) were cultured in Ham's F12K media (Gibco), fixated under paraformaldehyde and then stained with Hoechst33342 (Sigma Aldrich) and AlexaFluor 488 Phalloidin (Thermo Fisher Scientific) which stains the cytoskeleton and is not imaged here.More detailed descriptions of the sample preparation can be found in the study by Abariute et al. [1]

S4: Synthetic images
The synthetic images used to test the performance of our segmentation method were generated by first placing beads with a fixed radius (10 pixels for 100% zoom, i.e. around 1 micron) at random positions in an image.To make sure that the ground truths were well-defined, no optical smearing was added.The associated black and white image was then transformed by adding background noise to generate a photon image following the EMCCD noise model [2] and using the noise model chipParams parameters for the "Gain 100" in Table 1 in the main text.In practice, we calculate Poisson-Gamma distributed random numbers for each pixel (if pixel is a signal pixel, then we use λ sig + λ bg as Poisson distribution parameter, and λ bg for background pixel), and then add normally-distributed "read-out" noise with mean ∆ and variance r 2 .Finally, the image counts are rounded to the nearest integer.The parameter λ sig was chosen based on the required signal-to-noise ratio (which ranged from 3 to 10) and derived from the SNR formula introduced in the main text, which gives (S.17) S5: Determining the chip parameters for synthetic and experimental calibration data In Fig. S1 we show results of our method for determining an estimate of the gain g and the analog-to-digital conversion factor for three different gain settings (as taken from the settings of the camera), 50, 100, and 300.In The mean and variance for pixels randomly drawn from experimental image stacks is shown for the four different gain settings from the camera (0, 50, 100, and 300).The dashed lines represent the linear fits used to calculate the parameter estimates given in Table 1 in the main text.We made 2500 draws in total and each draw had 1000 pixels.(c-d) Distributions of estimated (fitted) gain parameters g as well as 1/f.Notice that the estimated parameters have rather narrow and symmetric distributions.
the estimated chip parameters for synthetic data.Notice the similarity of Figures S1 and S2 The mean and variance for pixels randomly drawn from synthetic image stacks is shown for the four different gain settings from the camera (0, 50, 100, and 300).Our procedure for generating synthetic images is described in Sec.S4.The dashed lines are the linear fits used to calculate the parameter estimates with a procedure described in the main text.We made 2500 draws in total and each draw had 1000 pixels.(Bottom) Distributions of estimated (fitted) gain parameters g as well as the inverse analog-to-digital conversion factor.Notice that the parameter distribution obtained from the synthetic images are similar to the distribution obtained from real images, compare to Figure 1 in the main text.

S6: Definition of statistical quantities for validation
In here we define statistical observables used in our photophysical image analysis pipeline and also utilized for quantifying the accuracy of our pipeline.Whenever a threshold is imposed on an image count histogram there will be errors.To quantify these errors, we refer to a black pixel (given some intensity threshold) as a "negative" and a white pixel as a "positive".There will then be four types of pixels (i) true negatives (TN) = background pixels which are correctly classified as black pixels; (ii) false negatives (FN) = signal pixels which are classified as black; (iii) true positives (TP) = signal pixels which are correctly classified as white; (iv) false positives (FP) = background pixels which are classified as white, see also Table S1.
Estimated/Ground truth background signal black true negative (TN) False negative (FN) white false positive (FP) true positive (TP) Table S1.Notation used when classifying pixels.
The difference between the four types of pixels are illustrated in Fig. S3.Note that the classification of individual pixels according to the four labels require us to have ground truths available for all pixels.In the absence of ground truth, 5/13 we can, however, still make statistical predictions on the error in classifications through the method introduced in the main text.An image in general consists of both signal pixels (positives) and background pixels (negatives).By applying a threshold N thresh icr to the image counts N icr for each pixel one seeks to recover the signal and background pixels.However, in this process one typically makes errors, leading to FPs and FNs.a) Schematic illustration of image counts for the background pixels in an image.b) Schematic illustration of the image counts for signal pixels in an image.c) Schematic illustration of the image counts in an image which contains both background and signal pixels.

HISTOGRAM FOR ALL PIXELS
Based on the TN, FN, TP and FP, a number of statistical measures are commonly used.We here relate these such observables to the conditional probabilities: p(white|bg) and p(black|s) introduced in the main text.Notice that we have the normalization conditions: p(black|bg) + p(white|bg) = 1 and p(black|s) + p(white|s) = 1.We have and can be estimated through: where f Bg is the ratio of background pixels, which we estimate through f Bg = (n bg + 1)/(m + 2 S7: Estimating λ bg and N bg icr using truncated fits and a goodness-of-fit test procedure In the Methods section in the main text, we introduce our procedure for estimating the Poisson parameter, λ bg for background regions in an image and the "optimal" truncation point, N bg icr .Here, we illustrate the output of our procedure for synthetic images and for the two experimental images in the main text.Here, we used 5 bins when calculating the χ 2 score, with bin edges determined by largest image count values for which the histogram of all image counts up to the edge were lower than 0.2, 0.4, 0.6 and 0.8 quantile.When the χ 2 values are below the horizontal lines, the fit is deemed as good (here, with a goodness-of-fit p-value, p GoF = 0.01).As "optimal" truncation point, N bg icr , we use the largest truncation point which passed the goodness-of-fit test.We used the following SNR values: top row, SNR = 3, middle row, SNR = 4.5 and bottom row, SNR = 6 when generating the synthetic images.8/13

0 e 0 e
p(n ie |λ) = λ nie n ie ! e −λ , (S.1) where we have the normalization condition: ∞ nie=0 p(n ie |λ) = 1.The conditionals for the outgoing electrons n oe with gain g is given in Eq. (1) in the main text, where we have a normalization condition: ∞ 0 p(n oe |g, λ)dn oe = 1.We may marginalize out the parameter n ie such that p(n oe |g, λ) = ∞ nie=0 p(n oe |n ie , g)p(n ie |λ).(S.2) We then recover a simple form of the characteristic function for n oe : e ipnoe = ∞ ipnoe p(n oe |g, λ)dn oe = ∞ ipnoe δ(n oe ) +

FigFig
Fig S1.Determining the chip parameters through a set of calibration experiments.(a-b)The mean and variance for pixels randomly drawn from experimental image stacks is shown for the four different gain settings from the camera (0, 50, 100, and 300).The dashed lines represent the linear fits used to calculate the parameter estimates given in Table1in the main text.We made 2500 draws in total and each draw had 1000 pixels.(c-d) Distributions of estimated (fitted) gain parameters g as well as 1/f.Notice that the estimated parameters have rather narrow and symmetric distributions.

Fig
Fig S2.Estimating the chip parameters using synthetic calibration data.(Top) The mean and variance for pixels randomly drawn from synthetic image stacks is shown for the four different gain settings from the camera (0, 50, 100, and 300).Our procedure for generating synthetic images is described in Sec.S4.The dashed lines are the linear fits used to calculate the parameter estimates with a procedure described in the main text.We made 2500 draws in total and each draw had 1000 pixels.(Bottom) Distributions of estimated (fitted) gain parameters g as well as the inverse analog-to-digital conversion factor.Notice that the parameter distribution obtained from the synthetic images are similar to the distribution obtained from real images, compare to Figure1in the main text.

Fig
Fig S3.Definition of true negatives (TN), false negatives (FN), false positives (FP) and true positives (TP).An image in general consists of both signal pixels (positives) and background pixels (negatives).By applying a threshold N thresh

Fig
Fig S4.Estimation of λ bg and N bg icr for synthetic images of varying SNR ratios.a) Synthetic images (see Sec. S4).b) Image count histograms for the images in panel a) with an overlaid fitted EMCCD-PMF with λ bg and truncation points as fit parameters.c) χ 2 scores for different image count truncation points.Here, we used 5 bins when calculating the χ 2 score, with bin edges determined by largest image count values for which the histogram of all image counts up to the edge were lower than 0.2, 0.4, 0.6 and 0.8 quantile.When the χ 2 values are below the horizontal lines, the fit is deemed as good (here, with a goodness-of-fit p-value, p GoF = 0.01).As "optimal" truncation point, N bg icr , we use the largest truncation point which passed the goodness-of-fit test.We used the following SNR values: top row, SNR = 3, middle row, SNR = 4.5 and bottom row, SNR = 6 when generating the synthetic images.

Fig
Fig S5.Estimates of N bg icr and λ bg as well as goodness-of-fit statistics for the for the experimental image of fluorescent beads (Figure 1 in the main text).(a) Estimated value for the "optimal" threshold, N bg icr for each of the tiles in Figure 1 in the main text.(b) Estimates of λ bg with a truncation point at N bg icr .(c) Goodness-of-fit χ 2 scores with a truncation point at N bg icr .d) Tiles which passed the χ 2 test at 1% significance level.Here, a value = 1 correponds to a passed test, and a value = 0 indicate that the test was not passed.

FigsFig
Figs. S9 and S10 provide further examples of our binarization and segmentation methods.

FigFigFig
Fig S7. Background Poisson parameter estimation and image thresholding for the lung cancer cell images.(a) Lung cancer cell image example.(b) Image count histogram with an overlaid fit obtained using p-value threshold, p GoF = 0.01.
All images were taken with an inverted Nikon Eclipse Ti microscope (Nikon Corporation, Tokyo, Japan), an Andor iXon DU-897 EMCCD camera (Andor Technology, Belfast, Northern Ireland) and Lumencor SOLA light engine (Lumencor Inc, OR, USA).The following Nikon objectives were used: 100x oil immersion (NA=1.4,Plan Apo VC), 20x air (NA=0.75,Plan Apo λ OFN25).FITC filter cube (EX 482/35 DM 505, EM 536/40) was used for the beads and DAPI filter cube (EX 387/11, DM, EM 447/60) was used for the lung cancer cells.The camera exposure time was 10 ms for all images and images in a video stack were captured with 13.76 frames per second.